Supporting On-the-fly Provenance Tracking in Stream Processing Systems
Abstract
A new class of data management systems that operate on high-volume streaming data is becoming increasingly important. Because such systems must process unpredictable streaming data in real time and deliver instantaneous responses, it is very difficult to validate stream processing results precisely and in a timely manner, to verify the stream computation that took place, and to investigate the processing steps used to generate result data. A mechanism that can precisely track the provenance of data streams at execution time is therefore crucial for confidence in the results these systems produce. This paper presents a novel on-the-fly stream provenance tracking mechanism that enables a collection of provenance queries to be performed dynamically without requiring provenance information to be stored persistently. The experimental results indicate that the impact of provenance collection on system performance is relatively small (7% overhead observed). In addition, our provenance solution offers low-latency processing (about 0.3 ms per additional component) with reasonable memory consumption.
Similar resources
Tracking Stream Provenance in Complex Event Processing Systems for Workflow-Driven Computing
Workflow-driven, dynamically adaptive e-Science is a form of scientific investigation often using a Service-Oriented Architecture (SOA) paradigm, designed to use large-scale computational resources on-the-fly to execute workflows consisting of parallel models, analysis, and visualization tasks. In the Linked Environments for Atmospheric Discovery (LEAD) project, with which our team is involved,...
Advances and Challenges for Scalable Provenance in Stream Processing Systems
While data provenance is a relatively well-studied topic in both the fields of databases and workflow systems, its support within stream processing systems presents a new set of challenges. Given the potentially high event rate of the input streams and the low processing latency requirements imposed by many streaming applications, capturing data provenance effectively in a stream processing sys...
Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering
Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long-haul networks drop packets, and readings are corrupted on the wire. Yet data-driven scientific models and data mining algorithms do not necessarily account for the inaccuracies when assimilating the data. Low overhead provenance collection partially solves this problem. We...
The Case for Fine-Grained Stream Provenance
The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (such as which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we firs...
Assessing the Trustworthiness of Streaming Data
Confidence policy is a novel notion that exploits the trustworthiness of data items in data management and query processing. In this paper we address the problem of enforcing confidence policies in data stream management systems (DSMSs), which is crucial in supporting users with different access rights, processing confidence-aware continuous queries, and protecting the secure streami...
Publication date: 2014